NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Automated Program Repair: Emerging Trends Pose and Expose Problems for Benchmarks

https://doi.org/10.1145/3704997

Renzullo, Joseph; Reiter, Pemma; Weimer, Westley; Forrest, Stephanie (March 2025, ACM Computing Surveys)

Machine learning (ML) pervades the field of Automated Program Repair (APR). Algorithms deploy neural machine translation and large language models (LLMs) to generate software patches, among other tasks. But, there are important differences between these applications of ML and earlier work, which complicates the task of ensuring that results are valid and likely to generalize. A challenge is that the most popular APR evaluation benchmarks were not designed with ML techniques in mind. This is especially true for LLMs, whose large and often poorly-disclosed training datasets may include problems on which they are evaluated. This article reviews work in APR published in the field’s top five venues since 2018, emphasizing emerging trends in the field, including the dramatic rise of ML models, including LLMs. ML-based articles are categorized along structural and functional dimensions, and a variety of issues are identified that these new methods raise. Importantly, data leakage and contamination concerns arise from the challenge of validating ML-based APR using existing benchmarks, which were designed before these techniques were popular. We discuss inconsistencies in evaluation design and performance reporting and offer pointers to solutions where they are available. Finally, we highlight promising new directions that the field is already taking.
more » « less
Full Text Available
Automatically Mitigating Vulnerabilities in Binary Programs via Partially Recompilable Decompilation

https://doi.org/10.1109/TDSC.2024.3482413

Reiter, Pemma; Tay, Hui Jun; Weimer, Westley; Doupé, Adam; Wang, Ruoyu; Forrest, Stephanie (May 2025, IEEE Transactions on Dependable and Secure Computing)

PRD lifts suspect binary functions to source, available for analysis, revision, or review, and creates a patched binary using source- and binary-level techniques. Al- though decompilation and recompilation do not typically succeed on an entire binary, our approach does because it is limited to a few functions, such as those identified by our binary fault localization.
more » « less
Full Text Available
The Evolution of Automated Software Repair

https://doi.org/10.1109/TSE.2025.3533309

Le_Goues, Claire; Nguyen, ThanhVu; Forrest, Stephanie; Weimer, Westley (January 2025, IEEE Transactions on Software Engineering)

Full Text Available
Towards a Cognitive Model of Dynamic Debugging: Does Identifier Construction Matter?

https://doi.org/10.1109/TSE.2024.3465222

Hu, Danniell; Santiesteban, Priscila; Endres, Madeline; Weimer, Westley (November 2024, IEEE Transactions on Software Engineering)

Debugging is a vital and time-consuming process in software engineering. Recently, researchers have begun using neuroimaging to understand the cognitive bases of programming tasks by measuring patterns of neural activity. While exciting, prior studies have only examined small sub-steps in isolation, such as comprehending a method without writing any code or writing a method from scratch without reading any already-existing code. We propose a simple multi-stage debugging model in which programmers transition between Task Comprehension, Fault Localization, Code Editing, Compiling, and Output Comprehension activities. We conduct a human study of n=28 participants using a combination of functional near-infrared spectroscopy and standard coding measurements (e.g., time taken, tests passed, etc.). Critically, we find that our proposed debugging stages are both neurally and behaviorally distinct. To the best of our knowledge, this is the first neurally-justified cognitive model of debugging. At the same time, there is significant interest in understanding how programmers from different backgrounds, such as those grappling with challenges in English prose comprehension, are impacted by code features when debugging. We use our cognitive model of debugging to investigate the role of one such feature: identifier construction. Specifically, we investigate how features of identifier construction impact neural activity while debugging by participants with and without reading difficulties. While we find significant differences in cognitive load as a function of morphology and expertise, we do not find significant differences in end-to-end programming outcomes (e.g., time, correctness, etc.). This nuanced result suggests that prior findings on the cognitive importance of identifier naming in isolated sub-steps may not generalize to end-to-end debugging. Finally, in a result relevant to broadening participation in computing, we find no behavioral outcome differences for participants with reading difficulties.
more » « less
Full Text Available
Self-organization in computation and chemistry: Return to AlChemy

https://doi.org/10.1063/5.0207358

Mathis, Cole; Patel, Devansh; Weimer, Westley; Forrest, Stephanie (September 2024, Chaos: An Interdisciplinary Journal of Nonlinear Science)

How do complex adaptive systems, such as life, emerge from simple constituent parts? In the 1990s, Walter Fontana and Leo Buss proposed a novel modeling approach to this question, based on a formal model of computation known as the λ calculus. The model demonstrated how simple rules, embedded in a combinatorially large space of possibilities, could yield complex, dynamically stable organizations, reminiscent of biochemical reaction networks. Here, we revisit this classic model, called AlChemy, which has been understudied over the past 30 years. We reproduce the original results and study the robustness of those results using the greater computing resources available today. Our analysis reveals several unanticipated features of the system, demonstrating a surprising mix of dynamical robustness and fragility. Specifically, we find that complex, stable organizations emerge more frequently than previously expected, that these organizations are robust against collapse into trivial fixed points, but that these stable organizations cannot be easily combined into higher order entities. We also study the role played by the random generators used in the model, characterizing the initial distribution of objects produced by two random expression generators, and their consequences on the results. Finally, we provide a constructive proof that shows how an extension of the model, based on the typed λ calculus, could simulate transitions between arbitrary states in any possible chemical reaction network, thus indicating a concrete connection between AlChemy and chemical reaction networks. We conclude with a discussion of possible applications of AlChemy to self-organization in modern programming languages and quantitative approaches to the origin of life.
more » « less
Full Text Available
Causal Relationships and Programming Outcomes: A Transcranial Magnetic Stimulation Experiment

https://doi.org/10.1145/3597503.3639096

Ahmad, Hammad; Endres, Madeline; Newman, Kaia; Santiesteban, Priscila; Shedden, Emma; Weimer, Westley (April 2024, ACM)

Understanding the relationship between cognition and programming outcomes is important: it can inform interventions that help novices become experts faster. Neuroimaging techniques can measure brain activity, but prior studies of programming report only correlations. We present the first causal neurological investigation of the cognition of programming by using Transcranial Magnetic Stimulation (TMS). TMS permits temporary and noninvasive disruption of specific brain regions. By disrupting brain regions and then measuring programming outcomes, we discover whether a true causal relationship exists. To the best of our knowledge, this is the first use of TMS to study software engineering. Where multiple previous studies reported correlations, we find no direct causal relationships between implicated brain regions and programming. Using a protocol that follows TMS best practices and mitigates for biases, we replicate psychology findings that TMS affects spatial tasks. We then find that neurostimulation can affect programming outcomes. Multi-level regression analysis shows that TMS stimulation of different regions significantly accounts for 2.2% of the variance in task completion time. Our results have implications for interventions in education and training as well as research into causal cognitive relationships.
more » « less
Full Text Available
CirFix: Automated Hardware Repair and Its Real-World Applications

https://doi.org/10.1109/TSE.2023.3269899

Santiesteban, Priscila; Huang, Yu; Weimer, Westley; Ahmad, Hammad (April 2023, IEEE Transactions on Software Engineering)

This article presents CirFix, a framework for automatically repairing defects in hardware designs implemented in languages like Verilog. We propose a novel fault localization approach based on assignments to wires and registers, and a fitness function tailored to the hardware domain to bridge the gap between software-level automated program repair and hardware descriptions. We also present a benchmark suite of 32 defect scenarios corresponding to a variety of hardware projects. Overall, CirFix produces plausible repairs for 21/32 and correct repairs for 16/32 of the defect scenarios. Additionally, we evaluate CirFix's fault localization independently through a human study (n=41), and find that the approach may be a beneficial debugging aid for complex multi-line hardware defects.
more » « less
Full Text Available
How Do We Read Formal Claims? Eye-Tracking and the Cognition of Proofs about Algorithms

https://doi.org/10.1109/ICSE48619.2023.00029

Ahmad, Hammad; Karas, Zachary; Diaz, Kimberly; Kamil, Amir; Jeannin, Jean-Baptiste; Weimer, Westley (May 2023, 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE))

Full Text Available
Digging into Semantics: Where Do Search-Based Software Repair Methods Search?

https://doi.org/10.1007/978-3-031-14721-0_1

Ahmad, Hammad; Cashin, Padraic; Forrest, Stephanie; Weimer, Westley (September 2022, Parallel Problem Solving from Nature)

Full Text Available
Seq2Parse: neurosymbolic parse error repair

https://doi.org/10.1145/3563330

Sakkas, Georgios; Endres, Madeline; Guo, Philip J.; Weimer, Westley; Jhala, Ranjit (October 2022, Proceedings of the ACM on Programming Languages)

We present Seq2Parse, a language-agnostic neurosymbolic approach to automatically repairing parse errors. Seq2Parse is based on the insight that Symbolic Error Correcting (EC) Parsers can, in principle, synthesize repairs, but, in practice, are overwhelmed by the many error-correction rules that are not relevant to the particular program that requires repair. In contrast, Neural approaches are fooled by the large space of possible sequence level edits, but can precisely pinpoint the set of EC-rules that are relevant to a particular program. We show how to combine their complementary strengths by using neural methods to train a sequence classifier that predicts the small set of relevant EC-rules for an ill-parsed program, after which, the symbolic EC-parsing algorithm can make short work of generating useful repairs. We train and evaluate our method on a dataset of 1,100,000 Python programs, and show that Seq2Parse is accurate and efficient : it can parse 94% of our tests within 2.1 seconds, while generating the exact user fix in 1 out 3 of the cases; and useful : humans perceive both Seq2Parse-generated error locations and repairs to be almost as good as human-generated ones in a statistically-significant manner.
more » « less
Full Text Available

« Prev Next »

Search for: All records